Single-View 3D Reconstruction of Animals
Humans have a remarkable ability to infer the 3D shape of objects from a single image. Even for complex and non-rigid objects like people and animals, from just one picture we can say much about their 3D shape, configuration, and even the viewpoint from which the photo was taken. Today, the same cannot be said for computers: existing solutions are limited, particularly for highly articulated and deformable objects. Hence, the purpose of this thesis is to develop methods for single-view 3D reconstruction of non-rigid objects, specifically people and animals. Our goal is to recover a full 3D surface model of these objects from a single unconstrained image. The ability to do so, even with some user interaction, will have a profound impact on AR/VR and the entertainment industry. Immediate applications include virtual avatars and pets, virtual clothes fitting, and immersive games, as well as applications in biology, neuroscience, ecology, and farming. However, this is a challenging problem because these objects can appear in many different forms.
This thesis begins by providing the first fully automatic solution for recovering a 3D mesh of a human body from a single image. Our solution follows the classical paradigm of bottom-up estimation followed by top-down verification. The key is to solve for the most likely 3D model that explains the image observations by using powerful priors. The rest of the thesis explores how to extend a similar approach to other animals. Doing so reveals novel challenges whose common thread is the lack of specialized data. To solve the bottom-up estimation problem well, current methods rely on human supervision in the form of 2D part annotations. However, such annotations do not exist at the same scale for animals. We address this problem by synthesizing data for fine-grained categories such as bird species. There is also little work that systematically addresses the 3D scanning of animals, which almost all prior works require for learning a deformable 3D model. We propose a solution that learns a 3D deformable model from a set of annotated 2D images with a template 3D mesh and from a small set of 3D toy figurine scans. We show results on birds, house cats, horses, cows, dogs, big cats, and even hippos. This thesis takes steps toward a fully automatic system for single-view 3D reconstruction of animals. We hope this work inspires further research in this direction.
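The top-down verification step described above amounts to an energy minimization: find the model parameters whose projected keypoints best explain the bottom-up 2D detections, regularized by a prior over plausible configurations. The sketch below illustrates that fitting paradigm only; it uses a toy random linear shape model, a weak-perspective camera, and an L2 pose prior, and all names (`joints_3d`, `energy`, etc.) are illustrative assumptions rather than the thesis's actual system, which fits a full deformable body model.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for a learned deformable model: pose parameters -> 3D joints.
# In the thesis this role is played by a full body/animal mesh model;
# a fixed random linear basis keeps the sketch self-contained.
rng = np.random.default_rng(0)
N_JOINTS, N_POSE = 12, 10
basis = rng.normal(size=(N_POSE, N_JOINTS, 3))
mean_shape = rng.normal(size=(N_JOINTS, 3))

def joints_3d(pose):
    """Deform the mean shape by the pose parameters."""
    return mean_shape + np.tensordot(pose, basis, axes=1)

def project(points, scale, trans):
    """Weak-perspective projection onto the image plane."""
    return scale * points[:, :2] + trans

def energy(params, keypoints_2d, prior_weight=1.0):
    """Reprojection error plus a Gaussian (L2) prior on the pose."""
    pose, scale, trans = params[:N_POSE], params[N_POSE], params[N_POSE + 1:]
    residual = project(joints_3d(pose), scale, trans) - keypoints_2d
    return np.sum(residual ** 2) + prior_weight * np.sum(pose ** 2)

# The bottom-up stage (2D keypoint detection) is simulated here with noisy
# projections of a ground-truth pose; a real system would use a detector.
true_pose = rng.normal(size=N_POSE)
keypoints_2d = project(joints_3d(true_pose), 1.5, np.array([3.0, -2.0]))
keypoints_2d += rng.normal(scale=0.01, size=keypoints_2d.shape)

x0 = np.concatenate([np.zeros(N_POSE), [1.0], [0.0, 0.0]])
result = minimize(energy, x0, args=(keypoints_2d,), method="L-BFGS-B")
print("fitted pose error:", np.linalg.norm(result.x[:N_POSE] - true_pose))
```

The prior term is what makes the inverse problem well posed: many 3D configurations project to the same 2D keypoints, and the prior selects the most likely one among them.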
Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping
Grasping objects by a specific part is often crucial for safety and for executing downstream tasks. Yet learning-based grasp planners lack this behavior unless they are trained on specific object-part data, making it a significant challenge to scale object diversity. Instead, we propose LERF-TOGO, Language Embedded Radiance Fields for Task-Oriented Grasping of Objects, which uses vision-language models zero-shot to output a grasp distribution over an object given a natural language query. To accomplish this, we first reconstruct a LERF of the scene, which distills CLIP embeddings into a multi-scale 3D language field queryable with text. However, LERF has no sense of objectness, meaning its relevancy outputs often return incomplete activations over an object, which are insufficient for subsequent part queries. LERF-TOGO mitigates this lack of spatial grouping by extracting a 3D object mask via DINO features and then conditionally querying LERF on this mask to obtain a semantic distribution over the object with which to rank grasps from an off-the-shelf grasp planner. We evaluate LERF-TOGO's ability to grasp task-oriented object parts on 31 different physical objects, and find that it selects grasps on the correct part in 81% of all trials and grasps successfully in 69%. See the project website at: lerftogo.github.io
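The two-stage conditional query can be sketched abstractly over a point cloud with precomputed per-point CLIP relevancy scores and DINO features. This is a minimal illustration of the idea, not the released LERF-TOGO code: the array names, the cosine-similarity grouping used in place of the paper's mask extraction, and the `sim_thresh` and `alpha` parameters are all assumptions.

```python
import numpy as np

def normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def object_part_distribution(clip_rel_obj, clip_rel_part, dino_feats,
                             sim_thresh=0.6):
    """Two-stage language query over a scene point cloud.

    clip_rel_obj:  (N,) relevancy of each 3D point to the object query ("mug").
    clip_rel_part: (N,) relevancy of each point to the part query ("handle").
    dino_feats:    (N, D) per-point DINO features used for spatial grouping.
    """
    # 1) Seed at the point most relevant to the object-level query.
    seed = np.argmax(clip_rel_obj)
    # 2) Grow a 3D object mask from the seed via DINO feature similarity,
    #    compensating for the language field's lack of objectness.
    feats = normalize(dino_feats)
    mask = feats @ feats[seed] > sim_thresh
    # 3) Conditionally evaluate the part query only inside the object mask,
    #    yielding a semantic distribution over the object.
    part_scores = np.where(mask, clip_rel_part, 0.0)
    total = part_scores.sum()
    return part_scores / total if total > 0 else part_scores

def rank_grasps(grasp_points, grasp_quality, points_3d, part_dist, alpha=0.5):
    """Re-rank geometric grasps by semantic part relevance at each grasp."""
    # Look up the part distribution at each grasp's nearest scene point.
    d = np.linalg.norm(points_3d[None] - grasp_points[:, None], axis=-1)
    semantic = part_dist[np.argmin(d, axis=1)]
    return np.argsort(-(alpha * semantic + (1 - alpha) * grasp_quality))
```

The key design point is the conditioning: the part query is never evaluated over the whole scene, only within the object mask, so "handle" on a cluttered table resolves to the handle of the queried object rather than the most handle-like region anywhere in view.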